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Abstract 

Any single-cell-resolved measurement generates a population distribution of phenotypes, characterized by a mean, a 
variance, and a shape. Here we show that changes in the shape of a phenotypic distribution can signal perturbations to 
cellular processes, providing a way to screen for underlying molecular machinery. We analyzed images of a Drosophila S2R-I- 
cell line perturbed by RNA interference, and tracked 27 single-cell features which report on endocytic activity, and cell and 
nuclear morphology. In replicate measurements feature distributions had erratic means and variances, but reproducible 
shapes; RNAi down-regulation reliably induced shape deviations in at least one feature for 1 072 out of 71 31 genes surveyed, 
as revealed by a Kolmogorov-Smirnov-like statistic. We were able to use these shape deviations to identify a spectrum of 
genes that influenced cell morphology, nuclear morphology, and multiple pathways of endocytosis. By preserving single- 
cell data, our method was even able to detect effects invisible to a population-averaged analysis. These results demonstrate 
that cell-to-cell variability contains accessible and useful biological information, which can be exploited in existing cell- 
based assays. 
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Introduction 

Advances in labeling and imaging have made it possible to 
collect quantitative information on a variety of cellular processes at 
single-cell resolution, and to generate high-quality population 
distribution data. Cell-to-cell variability is evident in any such 
measurement [1], [2]. This variability plays an essential role in 
processes ranging from bet-hedging in unicellular organisms, to 
cell differentiation, ageing, and infection in metazoa [3]-[7]. Cell- 
to-cell variability can be generated by intrinsic stochastic 
mechanisms and shaped by regulatory molecular circuitry: 
transcriptional regulation impacts protein expression distributions 
in bacteria and yeast [8]-[10]; the molecular machinery control- 
ling cell shape generates morphological variability in metazoan 
cells [11], [12]; DNA-damage checkpoints produce noisy oscilla- 
tions in cancer cell lines [13]. A corollary of these results is that the 
entire population distribution of a phenotype can be used to study 
the underlying biology. Here we present an explicit demonstration 
of this idea: rather than investigating how specific molecular 
mechanisms generate variability, we reverse the process and use 
variability itself as a general probe of those mechanisms. We first 
show that genetic perturbations reliably cause changes to the 
population distributions of a variety of phenotypes related to cell 
morphology and activity. We then demonstrate how such changes 
in phenotypic distributions can be exploited, in conjunction with a 
screening approach such as genome-wide RNA interference 



(RNAi), to probe cellular processes with unprecedented sensitivity. 
An application of this method, to study the molecular basis of 
multiple endocytic pathways in metazoan cells, is reported in a 
companion paper [14]. 

Image-based RNAi screens have previously been used to 
understand the molecular basis of diverse cellular processes, 
including: pathogen entry mechanisms; intracellular traffic; cell 
motility, growth, and differentiation; and cell death and aging 
[15]-[18]. Though these screens often collect data at single-cell 
resolution [19]-[24], they invariably focus on population-averaged 
values to select hits, and rely on heuristic normalization techniques 
to compensate for labeling and imaging artifacts which cause these 
values to be erratic [25], [26]. Here we use a radically different 
strategy, which paradoxically relies on the occurrence of ceU-to-ceU 
variability: we focus purely on the shapes of population 
distributions which, as we show, are robust against measurement 
artifacts. The shapes of these distributions change under RNAi 
down-regulation; we quantify these changes, and thus identify 
genes which influence various cellular phenotypes. This shape- 
based strategy complements existing methods for analyzing image- 
based screens. It can be directiy applied to any perturbation 
experiment which generates single-cell data, for a variety of 
phenotypes, and is able to identify subde hits which are missed 
using standard approaches. 
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Figure 1. An image-based RNAi screen for endocytic and cell morphological features. (A) A single Drosophila S2R+ cell, fixed and imaged 
at 20X and 0.75 NA; the scale bar is 3 |im. FITC-Dextran (green) labels the GEEC pathway responsible for fluid-phase uptake; Alexa568-Transferrin (red) 
labels the clathrin-dependent receptor-mediated endocytic pathway; the Alexa647-Okt9 antibody (blue) labels steady-state cell surface levels of the 
Transferrin receptor; the nucleus (not shown) was imaged with a Hoechst stain. Region 1: pure GEEC endosome; Region 2: pure Transferrin 
endosome; Region 3: colocalization signature, marking a heterotypic fusion product between the two types of endosomes; Region 4: surface cluster 
of Okt9. (B) Schematic representation of the cell from Figure 1A. (C) Schematic representation of intensity features (orange), which track the cell- 
averaged intensity of the various fluorescent labels; and geometric features (purple), which track the sizes and shapes of the cell, of the nucleus, and 
of endosomes. (D) 27 single-cell features. The two rows correspond to intensity and geometric features; each column relates to individual endocytic 
pathways or cell-morphological features. See Table SI in File SI for detailed feature descriptions. (E) The screen was carried out on glass slides printed 
with 300 wells in a 10x30 format, each well containing dsRNA targeted against different genes. Colors represent negative (black) or positive (blue) 
control wells, while white represents test wells. Details of the image analysis and the experimental conditions are provided in the companion paper 
[14]. 

doi:1 0.1 371 /journal.pone.0090540.g001 



Results 

Single-cell-resolved features from an image-based RNAi 
screen 

We applied these ideas in the context of an image-based RNAi 
screen, with tlie goal of identifying molecular machinery involved in 
multiple pathways of metazoan endocytosis. The results of this 
endocytic screen are presented in the companion paper [14]; here 
we focus on the use of ceU-to-cell variability as a general cell- 
biological probe. We simultaneously tracked two endocytic path- 
ways [27] in Drosophila S2R-H cells using a pulse-labeling assay [28] : 
the clathrin- and dynamin-independent CLIC/GEEC endocytic 
pathway [29] responsible for fluid-phase uptake (probed using 
FITC-conjugated dextran; green, Fig. 1A,B); and the canonical 
clathrin-dependent pathway [30] responsible for receptor-mediated 
endocytosis (probed by using Alexa568-conjugated Transferrin, 
which is taken up by an ectopicaUy expressed human Transferrin 
receptor [28]; red, Fig. 1A,B). We used fluorescence microscopy and 



automated image analysis to extract 27 features for each cell (see 
Table SI in File SI, and companion paper [14] for further details): 
12 intensity-dependent features describing total uptake levels of the 
two endocytic probes, and surface levels of the Transferrin receptor 
as labeled by the monoclonal antibody Okt9 [28] (orange cartoon, 
and 11-112; Fig. 1C,D); and 15 geometric features quantifying the 
shape, size and number of endocytic compartments, as well as 
nuclear and cell morphology (purple cartoon, and G1-G15; Fig, 
1C,D). The screen was performed on custom-designed glass slide 
arrays of 300 wells printed with double-stranded RNA (dsRNA) 
(Fig. IE): 30 wells were negative controls (15 with no dsRNA and 15 
with dsRNA targeting the gene for zeocin resistance, which is absent 
in the Drosophila genome; black in Fig. ID); 8 wells were positive 
controls (dsRNA targeting Shibire [31] for the receptor-mediated 
pathway, and dsRNA targeting Sec23 and Arfl for the fluid-phase 
GEEC pathway [28]; shades of blue in Fig. ID); the remaining wells 
contained dsRNA targeting individual genes to be screened. Each 
slide was assayed in triplicate, with scrambled dsRNA patterns (Fig, 
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Figure 2. Well-to-well and cell-to-cell variability. (A,B) Population distributions (histograms) of features 13 (A) and G3 (B), for three negative 
control wells from a single slide. Hollow bars show raw distributions; solid bars show the data when distributions are normalized to have zero mean 
and unit variance. (C,D) Heat maps of population-averaged mean values for features 13 (C) and G3 (D). The positional effects in Figure 2C likely arise 
from labeling and imaging artifacts. (E) ANOVA F-statistic for inter-row variance versus within-row variance of distribution means (x-axis) or 
skewnesses (y-axis), for all 27 features. Each point shows the median F-statistic over 84 slides; intensity features are colored orange, geometric 
features are colored purple. Data for each slide are shown in Figure S1B,C in File SI. (F) Cumulative distributions of feature 13, from a negative control 
well (grey) and a positive control well (Arfl; blue). The left panel shows raw data; the right panel shows that cumulative distributions are still 
distinguishable after normalization. 
doi:1 0.1 371 /journal.pone.0090540.g002 



SIA in File SI). Cell.s were grown for four days in the presence of 
dsRNA, then fixed and imaged. We tested a total of 7216 dsRNAs 
targeting 7131 unique genes. 

Cell-to-cell and well-to-well variability in feature 
distributions 

We measured phenotypic distributions of the 27 single-cell 
features for each well (hollow histograms, Fig. 2A,B), with 
population sizes ranging from 200 to 500 cells. All the distributions 
we measured showed significant cell-to-cell variability; but we also 
saw a great deal of well-to-well variability between replicate 
measurements. Even among negative controls, feature distribu- 
tions were erratic, though geometric features were typically more 
robust than intensity features (compare hollow purple histograms 



in Fig. 2B to hollow orange histograms in Fig. 2A) with the latter 
showing significant slide-positional artifacts (orange heat map. Fig. 
2C). To further quantify this effect we carried out 1-way ANOVA 
[32] for two scale-related measures (mean and variance) and two 
drmensionless shape-related measures (skewness and kurtosis) of 
each distribution, using the ANOVA F-statistic to compare the 
variability of these measures between and within the rows of each 
slide. This analysis confirmed (Fig. 2E) that the mean and variance 
of intensity features (but not of geometric features) were susceptible 
to row-dependent artifacts, but shape-related measures for all 
features less so. The same trends were observed for variability 
between columns of a slide and between entire slides, and between 
negative controls on different slides, across all intensity and 
geometry features (Fig. S1B,C in File SI). 
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Figure 3. Calculating Z-scores to quantify shape changes. (A) A 

heatmap of all pair-wise KS-statistics for all 300 wells on a 
representative slide (columns) against negative controls (rows), for 
feature 13. The genes have been sorted horizontally, starting with 20 
selected negatives (black), test genes (grey), positives (blue), and 10 
unused negatives (green). Lighter boxes indicate higher shape 
deviations; the negatives tested against themselves show the lowest 
scores (dark sections on the left and right), while the positives show the 
highest scores (white vertical stripe). A potential hit (red triangle) shows 
up as a white vertical line. (B) Histogram of Z-scores for the slide 
depicted in (A), with test genes, positives and unused negatives plotted 
separately. Potential hits are marked by red triangles. The final selection 
of a hit depends on how a gene performs in triplicate. 
doi:1 0.1 371 /journal.pone.0090540.g003 

Based on these results, we surmised that the well-to-well 
differences in the mean and variance of intensity features had 
their origin in background (additive) and scale (multiplicative) 
artifacts (from dye loading or illumination, for example) which 
don't influence geometric measurements. These artifacts should 
neutralized by the standard procedure of subtracting the mean 
and dividing by the standard-deviation, leaving a distribution that 
is characterized by its shape alone. Indeed, phenotypic distribu- 
tions from replicate measurements converged to nearly identical 
shapes once normalized in this way, for both intensity and 
geometry features (filled histograms, Fig. 2A,B). However, if the 
feature axis was re-scaled non-linearly before normalization (such 
as by a logarithmic or power-law transformation), geometric 
distributions typically converged while intensity distributions did 



not. This provided further evidence that normalization was 
working to counter affme (additive and multiplicative) artifacts. 

Feature distributions change shape under perturbations 

Our key observation was that positive and negative control wells 
could be distinguished even after normalizing out mean and 
variance (Fig. 2F); the resulting distributions are characterized 
entirely by their shapes. We used a Kolmogorov-Smirnov-like (KS) 
statistic [32], [33] to assign a Z-score to each gene on a slide (Fig. 
3; Methods: A Z-score to quantify shape changes of phenotypic 
distributions). This Z-score quantifies distribution shape changes 
between test and negative control wells; the higher the score, the 
greater the shape deviation. Since each gene was tested in 
triplicate, we calculated three Z-scores for every gene, and pooled 
these data over the entire screen. We classified a gene as a hit if it 
occurred two or more times above a given Z-score threshold (Fig. 
3B). Figure 4A shows the number of hits selected from the screen 
(green curve) compared to the number of hits selected from 
randomly permuted genes (grey band) as the Z-score threshold is 
varied. The deviation of the green curve from the grey band 
reveals the presence of reproducible hits in the dataset. The 
maximal deviation occurs near a Z-score threshold of 3 across all 
features (Fig. S2A in File SI). Using this threshold we identified 
1072 unique genes as hits for one or more features [14]. 

Shape-based scoring reveals a spectrum of weak-to- 
strong genetic contributions 

The response to perturbations over tripUcate measurements 
revealed an unexpectedly complex connection between genes and 
phenotypes. For each feature, at a given Z-score threshold we can 
classify genes into four bins: those occurring either zero, one, two, 
or three times above threshold. Each of these bins will contain a 
combination of true hits and negatives. We can infer false-positive 
rates (FP: the fraction of above-threshold negatives; Fig. 4B, top 
panel) and true-positive rates (TP: the fraction of above-threshold 
hits; Fig 4B, bottom three panels) by fitting the observed gene 
number in each bin to a statistical model (Fig. 4C; Methods: 
Assessing statistical power from triplicate data). The inferred FP 
rates matched well to the FP rates measured for negative controls 
(Fig. 4D). However, we were not able to infer a single TP rate 
consistent with the data. The behavior of the positive controls 
highlights the problem: different genes appear to have different, 
characteristic TP rates (Fig. 4B). 

Extending this idea, we postulated that hits over the entire 
screen had a distribution of TP rates. Under this assumption, it was 
possible to fit all the observed data, and therefore to infer FP rates 
(Fig. 4E, x-axes) and the range of TP rates (Fig. 4E, green band 
along y-axes) as the Z-score threshold was varied. For most 
features, a Z-score threshold of 3 corresponds to FP < 0. 1 and TP 
~ 0.5 for single measurements; the FP rate is lower and the TP 
rate higher if we use triplicate data with a 2/3 rule (Fig. 4C). In 
support of our calculation, measured TP rates from different 
positive controls fell within the inferred band of TP rates (Fig. 4E, 
solid lines; Fig. S3 in File SI). The broad distribution of TP rates is 
a property of the underlying biology, related to the varying degrees 
of influence different genes can have on the phenotype of interest. 

Shape-based scoring outperforms other scoring methods 

There are many possible variations of the KS-based Z-score, 
differing on how the phenotypic distributions are initially 
modified. We compared three options: (U) Un-normalized or 
raw distributions; (P) Partially normalized distributions, trans- 
formed to have the average mean and variance of nearest 
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Figure 4. Statistical performance of shape-based scoring. (A) The number of genes that occur two or more times above each Z-score 
threshold, for three representative features. The green curve shows the number of genes selected from the screen; the grey band represents the 
upper and lower limits of number of genes selected from 1 000 randomly permuted datasets. We used a Z-score cutoff of 3 (red line) to select hits. (B) 
For feature 13, distribution of Z-scores for negative control wells (top panel), and positive control wells (second panel: Shibire; third panel: Arfl; fourth 
panel: Sec23). At a given Z-score threshold (red line), the false-positive rate (FP = a) is the fraction of negatives above threshold (solid grey bars), 
while true-positive rate (TP = 1-/J) is the fraction of positives above threshold (hollow blue bars). (C) For feature 13, the upper left panel shows the 
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fraction of genes occurring zero, one, two, or three times above a Z-score threshold of 3; circles show actual data, bars show the inferred composition 
of each bin, in terms of positives (green) and negatives (grey). The lower-left panel shows the fraction of positives in each bin; genes occurring two or 
more times above threshold are strongly enriched in positives. The two right panels show the performance when hits are selected from a single 
measurement rather than using triplicates. (D) The grey band shows the range of inferred FP rates for 25 features (excepting features G1 1 and G1 5 for 
which the inference procedure fails to converge); the black line shows the mean of the measured FP rates for the same features. (E) Inferred TP rates. 
Green bands show the range of inferred true positive rates (l-j?o ± c; see Methods: Assessing statistical power from triplicate data) as a function of 
inferred false positive rates (a); blue lines show the observed TP and FP rates among positive control genes (light: Shibire; medium: Sec23; dark: Arfl). 
The box to the right of each graph shows the inferred average TP rate 0-lSo) at FP = 0.1; the solid dot shows the performance using normalized 
distributions, the hollow dot shows the performance using un-normalized distributions. 
doi:l 0.1 371/journal.pone.0090540.g004 



neighbors; (N) Normalized distributions, transformed to have zero 
mean and unit variance. We evaluated the observed true positive 
rate (assuming aU positive controls are hits for all features), and the 
inferred mean true positive rate (from triplicate data) at a false 
positive rate FP = 0.1, and calculated the improvements in 
performance between different methods: TP^ - TPu and TFy - 
TPp. The values for 25 features (excepting Gil and G15 for which 
triplicate inference failed) are binned into histograms in Figure 5A; 
the bar graphs indicate the fraction of intensity (orange) or 
geometric (purple) features in each bin; left panels show observed 
improvements for positive controls, right panels show inferred 
improvements for all genes. We find that for intensity features, 
which are plagued by measurement artifacts, normalization 
actually increases the power of the screen; for geometric features, 
which are less susceptible to artifacts, our method performs at least 
as well using normalized distributions as using raw distributions. 
Specifically: at FP = 0.1, the inferred average TP rate with 
normalization (that is, using shape alone; Fig. 4E, solid green dots) 
exceeds that without normalization (Fig. 4E, hollow green dots) for 
all intensity features and some geometric features (Fig. 5, top 
panels). We did however find that normalization typically does not 
improve or harm performance when applied to positive controls 
(Fig. 4, left panels) which are associated with strong perturbations. 

We next compared the performance of the KS-based Z-score 
with that of a Z-score where the 'signal' is restricted only to the 
mean values of feature distributions. This 'traditional' Z-score is 
defined as the squared difference between the signal of a given well 
and the average signal of the wells on a given slide, normalized by 
the variance of all the signals. As before, we can use triplicate data 
with a 2/3 rule to select hits; Figure S2B in File SI shows the 
number of hits selected using the traditional Z-score (green curve) 
compared to the number of hits selected from randomly permuted 
genes (grey band). Unlike the KS-based Z-score (Fig. S2A in File 
SI), at no threshold does the number of hits selected using the 
traditional Z-score rise above that expected by chance. For further 
validation, we screened a subset of predicted hits using an 
independent experimental assay. This validation assay was 
designed to minimize positional artifacts at the expense of lower 
throughput, so the mean values of feature distributions could be 
used to select hits [14]. We found that for both intensity and 
geometric features, less than 10% of the traditional Z-scores of 
validated genes exceeded unity in absolute value; most of these hits 
would have been completely missed by the high-throughput 
screen. Conversdy, there was a strong overlap between genes 
selected by shape-based scoring with those selected from the 
mean-value analysis in the validation assay; that is, the hits which 
had been selected only for their ability to modify the shapes of 
phenotypic distributions had a strong tendency to perturb the 
means of those distributions as well (Fig. 6A). 

Biological relevance of shape-based scoring 

We applied three independent criteria to test whether our 
shape-based scoring strategy generates biologically meaningful 
information. First, we examined whether our hit selection might 



have been influenced by overall cell health and proliferation, using 
cell density as a proxy. We found that the cell densities observed in 
negative control wells broadly overlapped with those in test wells 
and in wells corresponding to hits. This suggests that the observed 
changes in the shapes of phenotypic distributions do not arise from 
gross deficiencies in cell health due to RNA interference. 
However, there were subtle differences in cell density between 
hit subsets (Fig. 6B): the number of cells per field among fluid and 
Transferrin uptake hits (medians 90 and 92, respectively) were 
only slightiy below those among negative control wells (median 
98), btit nuclear hits showed a striking correlation with high cell 
number (median 139). This suggests an unexpected connection 
between nuclear morpholog)' and proliferative capacity, though 
this must be further explored to rule out confounding factors. 
Second, we checked whether individual genes influenced multiple 
types of features in the expected manner. Of the 1072 hits, 470 
influenced fluid-phase uptake, 602 influenced Transferrin uptake, 
267 influenced nuclear morphology, and 26 influenced cell size 
(Fig. 6C). Consistent with the expectation that different endocytic 
pathways share core molecular macliinery, there was a high 
degree of overlap between the fluid and Transferrin uptake gene 
sets (27%: 211 genes compared to 34 expected by chance). In 
contrast, there was no significant overlap between endocytic hits 
and those influencing nuclear morphology (5%: 56 genes 
compared to 31 expected by chance). Third, we asked whether 
the lists of hits were enriched for protein complexes, as defined by 
the Gene Ontology [34] 'cellular component' classification system 
(Fig. 6D). Protein complexes involved in basic cellular processes 
such as transcription, mRNA processing, and proteolysis, all 
emerged as strong hits; we also found components of the 
cytoskeletal and traffic machinery enriched among endocytic hits. 
These complexes are expected to play a pervasive role in cellular 
processes; what is surprising is that they can be detected through 
their influence on the shapes of phenotypic distributions alone. By 
all these criteria our shape-based analysis appears to be well suited 
to probe the entire spectrum of genes involved in complex cellular 
processes, covering genes with both subtie and strong effects across 
a variety of phenotypes. 

A 'cell state' model for feature distribution shape 

changes 

We have seen that population distributions change shape when a 
perturbation is applied; this suggests that different cells in the 
population might be in distinct states, causing them to respond 
differently to the perturbation. We can define this hidden 'cell state' 
[35] to be a feature with the following properties: (1) it is not itself 
affectc'd l)y a perturl)ation; (2) it can be used prc'dict the rc'sponsc' of 
some otlier feature to that perturbation. We refer to such a cell-state 
feature as a "classifier", and to the feature being perturbed as the 
"output". We searched for potential classifier-output pairs among 
the 1 2 intensity features, as well as the cell-size feature (geometric 
features had narrow or discrete distributions, and were therefore 
poor candidates for classifiers). Cells were sorted into three classifier 
percentile bins (A: 5-%-15%, B: 45%-55%, C: 85%-95%) and the 
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Figure 5. Comparing shape-based scoring to other scoring 
methods. (A) The difference in the true-positive rate TP^-TPu 
represents the increase in performance derived from normalization. 
Upper panels: black histograms show the distribution of TP^-TPy values 
for 25 features (excepting Gil and G15); colored bars represent the 
fraction of intensity (orange) and geometric (purple) features in each 
bin. The left panel shows the observed performance for positive 
controls; the right panel shows the inferred performance over all genes. 
Lower panels: same as upper, but now the normalization strategy (TPn) 
is compared to partial normalization (TP-p), when bins are normalized to 
have the average mean and variance of their eight nearest neighbors. 
(B) The 'traditional' Z-score is defined based only on the mean values of 
feature distributions. The figure shows the cumulative distributions of 
traditional Z-score values for genes that have been validated as hits for 
intensity features (orange) or geometric features (purple) in an 
independent experimental assay. Less than 10% of these Z-scores have 
an absolute value greater than unity. 
doi:1 0.1 371/journal.pone.0090540.g005 



mean value of the output was calculated for each bin {mj^, m^, mc)- 
These were used to define a correlation score: )) = (ma - ms)/ (™b^ 



mc). For a given classifier-output candidate pair { i, j }, we 
calculated the median value ofjc over all positive control wells, and 
subtracted from it the median value over all negative control wells, 
to get the final score A)i,j. This 13x13 matrix is shown as a heatmap 
(Fig. S4A in File SI) separately for the positive controls Arfl, Shi, 
and Sec23. A feature j will be a good candidate for a classifier if the 
diagonal entry A^,, is close to zero (since it must be unaffected by the 
perturbation); a corresponding featurej is a good candidate for the 
output if the entry Ayij is high in magnitude, which occurs when the 
three binned populations respond differentially to RNAi. One 
candidate pair is shown in the Shibire heatmap (Fig. S4A in File SI): 
the classifier is cell size (Gl 1) and the output is the intensity feature 
19. To explore this classifier-output pair further, we carried out 
pairwise KS-tests of Shibire-RNAi wells against negative control 
weUs, for both 19 and Gl 1 (Fig. S4B in File SI). It is clear diat 19 is 
strongly affected by the perturbation, but Gil is not. The same 
result was evident when we pooled data from group of similar wells 
(Fig. S4B in FUe SI, red box) and examined the distributions of Gl 1 
and 19 witliout (Fig. S4C in File S 1 , grey histogram) and with (Fig. 
S4C in File SI, blue histogram) the RNAi perturbation. We next 
split the cells into five classifier percentile bins, and examined the 
mean value of the output in each bin without (Fig. S4C in File S 1 , 
grey curve) and with (Fig. S4C in File SI, blue curve) RNAi. We 
found that the response of the output to RNAi depended strongly on 
cell size. This implies that the variation in response is at least partly 
due to the background variation in cell size; however, most of the 
variation remains unexplained. This is not surprising, since we have 
only tested a handful of features for explanatory value. 

Discussion 

Cell biologists, when limited by the sensitivity of their assays, are 
sometimes forced to pool data from large numbers of cells. For 
example, transcriptional, proteomic, or metabolic analyses require 
large sample sizes in order to enhance signal-to-noise and thus 
ensure reproducibility. Behind the interpretation of such measure- 
ments is the tacit assumption that population averages accurately 
reflect the state of individual cells. However, as single-ceU-resolved 
data have become more widely available, the "myth of the average 
cell" has been found to be a poor reflection of reality [1] . CeU-to-ceU 
variability is ubiquitous, and population averages blur much of the 
complexity of ceU-biological phenomena. Several elegant studies 
have begun to reveal details about the processes which give rise to 
this variability. Here we have taken a complementary approach, 
using cell-to-ceU variability itself as the ceU-biological probe. Rather 
than relying on traditional population-averaged measures to detect 
perturbations, this is precisely the information we ignore - we 
normalize aU our phenotypic distributions to have zero mean and 
vmit variance, and focus purely on shape. This operation neutralizes 
the scale and background artifacts inherent in the dye-labeling and 
imaging approaches central to many cell biological assays. But there 
is a danger that normalization might cause initiaUy distinct 
distributions to collapse onto a single curve, thus throwing the 
baby out with the bathwater. Our key finding is that, for a broad 
range of phenotypes, normalized distributions do not collapse; 
instead, perturbations cause robust changes to their shapes. By 
detecting these changes, we can screen for candidate genes involved 
in a variety of cellular processes. 

As a practical matter, the fact that we have been able to exploit 
ceU-to-ceU variability in this way without needing to understand its 
mechanistic basis implies that our methods are broadly applicable. 
However, our results also raise questions of a fundamental nature. 
What we observe as the individuality of ceUs in their response to 
perturbations might be intrinsically stochastic; but it might also 
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Figure 6. Results and biological significance. (A) We validated hits using an independent experimental assay based on the population-averaged 
mean value of each phenotype [14]. We carried out this measurement both for hits as well as for a number of non-hits (genes with below-threshold 
Z-scores). The y-axis shows the fraction of original hits validated (FP = 0.1); the x-axis shows the fraction of sub-threshold genes validated. Each dot 
gives the result for a single feature type; hits for features Gil and G15 were not included in the secondary measurement. Horizontal and vertical 
dotted lines show the FP rate. The distinction between hits and sub-threshold genes was based on shape-based scoring alone, but the former are 
detected at a much higher rate than the latter using the population-average-based assay. This demonstrates a strong correlation between the ability 
of a gene to influence the mean value of a phenotype and the shape of a phenotypic population distribution. (B) Relationships of hit subsets to cell 
density. We show the cell number per imaging field as a vertical histogram. The median (horizontal line), mean (box), and percentiles (5%, 25%, 75%, 
95%) of the cell number distribution are overlaid. Histograms are separately shown for negative control wells, all test wells, and for fluid uptake. 
Transferrin uptake, and nuclear morphology hits. (C) Area-proportional Venn diagram of hits that influence fluid-phase uptake (F), Transferrin- 
receptor-mediated uptake (T), or nuclear and cell morphology (N). Of the 26 genes that influence cell size, 21 which do not influence other features 
have been omitted. Numbers give the sizes of non-overlapping subsets. The total number of hits is 1051 (shown) + 21 (not shown). (D) Functional 
enrichment. We annotated genes according to the Gene Ontology (GO) 'cellular component' classification system, using only the most specific term 
for each gene. We used the one-tailed Fisher's exact test to determine an enrichment p-value for each teach GO term among the 1072 hits, given its 
background occurrence among the 7216 RNAi probes. To correct p-values for multiple hypotheses, we used 1000 simulated datasets in which GO 
terms had been randomly permuted. The table shows the seven GO terms with corrected p-values < 0.1, along with the observed and expected 
number of genes among the seven non-overlapping gene subsets of the Venn diagram. 
doi:10.1371/journal.pone.0090540.g006 



reflect a prior heterogeneity of biochemical and biophysical cell states 
in the population, arising fi-om autonomous factors such as 
transcriptional, metabolic, and cytoskeletal variations, or higher- 
order effects involving cell-to-cell communication and coordination. 
Disentangling these possibilities is an important problem in its own 
right: the more we discover about the origins of ceU-to-ceU variability, 
the better will we understand the secret lives of single cells. 



and divide by the standard deviation, resulting in a distribution with 
zero mean and unit variance. We then compare the normalized 
distributions from two wells by using the Kolmogorov-Smirnov (KS) 
test statistic D: the maximum vertical distribution between the 
cumulative distribution functions [32], [33]. If the two sampes have 
jVi and M2- data points, the effective sample size is given by 1 /jV, = 
1/jVi + l/jVj, and the final statistic is: 



Methods 

A Z-score to quantify shape changes of phenotypic 
distributions 

We start with a list of values of some phenotype of interest 
measured in any well. For each data point, we subtract the mean 



D* = ^/%D Eq. 1 

The Z-score itself is calculated by quantifying the shape deviation 
from negative controls. For each negative control well, we 
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calculate its D value against all 300 wells of a sUde; this 300-length 
vector is normalized to have zero mean and unit variance. 
Arranging each such vector as a row of a 30x300 matrix, the 
average of each column represents the shape deviation of the 
corresponding well. In practice we first removed the 10 worst- 
scoring negatives so that we could use them to estimate false- 
positive rates. This left a 20x300 matrix shown as a heat map in 
Figure 3A, with hghter boxes indicating higher shape deviations. 
The Z-score of any well is the average value of the corresponding 
column; the higher the score, the greater the shape deviation. 

Assessing statistical power from triplicate data 

Each gene is tested in triplicate, and therefore assigned three Z- 
scores. At a given Z-score threshold, genes can be classified into 
four bins: occurring zero, one, two, or three times above that 
threshold. Each bin contains some to-be-determined combination 
of negatives and true hits. We assume a fraction h of all tested 
genes to be true hits. At a frxed threshold, we assume hits to have a 
Gaussian distribution of false-negative (FN) rates with mean ySg 
and variance a , while negatives have a false-positive (FP) rate a. 
Under these assumptions, genes fall into triplicate bins as follows: 



bin hits non — hits 



where the brackets •(•••) denote averages over the distribution 
of P values, easily computed under the Gaussian assumption. We 
next estimate the values of {a,flQ,a,h} which best describe the 

observed triplicate bin data. At each threshold, four normalized 
bins correspond to three degrees of freedom, while we are trying to 
estimate four unknown parameters. However, since A is a constant 
independent of threshold, the system is "almost" determined, and 
we are able to find a unique solution with a non-zero least-squares 
error. We can thus find {a,pQ,a} as a function of the threshold, as 
well as a fixed value of A, for each parameter (Fig. S3 in File SI). 

Supporting Information 

FUe SI Contains the following tables and figures: Table SI. 
Feature descriptions. Figure SI. Slide layout and positional 
artifacts. Figure 82. Presence of reproducible hits. Figure S3. 
Performance inferred from triplicate data. Figure S4. Cell states 
and cell-to-cell variability. 
(PDF) 
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